1 Markdown basics

Markdown is one of the world’s most popular markup languages used in data science. Both R Markdown and Jupyter Notebooks use Markdown to provide a unified authoring framework for data science, combining code (R, Python, SQL, …), its results, and commentary. The documents are fully reproducible and support dozens of output formats, such as PDFs, Word files, slideshows, dashboards and more.

R Markdown Notebook

However, using Markdown doesn’t mean that you can’t also use Hypertext Markup Language (HTML). You can add HTML tags to any Markdown file.

According to Wickham & Grolemund (2016), Markdown files are designed to be used in three ways:

  1. For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.

  2. For collaborating with other data scientists, who are interested in both your conclusions, and how you reached them (i.e. the code).

  3. As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.


Learn the most important basics of Markdown in the interactive “60 Seconds Markdown Tutorial”. Markdown offers many more options; for example, this link will bring you back to the top of the page. For an overview of the various output formats of R Markdown documents, watch this short video from RStudio: “What is RMarkdown?”.

Furthermore, you can add your own Cascading Style Sheets (CSS) to change the style of your HTML document. CSS describes how HTML elements should be displayed, and you attach a stylesheet with the css option in the YAML metadata (see section 1.3, YAML metadata).

1.1 Integrate code and parameters

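R Markdown documents can include one or more global parameters whose values are set when you render the report. For example, the code below uses the parameters country and year to filter the gapminder data. Note that params$country is compared against the continent column, so despite its name it holds a continent (here Africa).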

africa_07 <-  
  gapminder %>% 
  filter(continent == params$country) %>% # use parameter
  group_by(continent, year) %>% 
  summarize(mean = round(mean(lifeExp),2)) %>% 
  filter(year == params$year) %>% # use parameter
  pull(mean) # obtain result as number
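
The params object used above is filled from a params field in the YAML header. As a minimal sketch (the report file name and the default values "Africa" and 2007 are assumptions for illustration, not taken from this document's header), the following writes a tiny parameterized report, reads its declared defaults, and indicates how they could be overridden at render time:

```r
library(rmarkdown)

# Hypothetical minimal report: the defaults "Africa" and 2007 are assumed values
rmd_lines <- c(
  "---",
  "title: \"Life expectancy\"",
  "output: html_document",
  "params:",
  "  country: \"Africa\"",
  "  year: 2007",
  "---",
  "",
  "Filter value: `r params$country`, year: `r params$year`"
)
report <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, report)

# Read the declared defaults from the YAML header without rendering
defaults <- yaml_front_matter(report)$params
str(defaults)

# Override the defaults when rendering (requires pandoc, hence commented out)
# render(report, params = list(country = "Asia", year = 1997))
```

Rendering the same file with different params values produces one report per continent and year without touching the analysis code.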

In R Markdown it is also easy to integrate the results of R code into text. We can run an analysis like the one above and insert its result (stored in africa_07) directly into a sentence. Instead of typing the number by hand, we write the inline code `r africa_07` (read this post to learn how to display R code snippets in Markdown).

This code:

  • The average life expectancy in Africa equals `r africa_07` years in 2007.

renders to:

  • The average life expectancy in Africa equals 54.81 years in 2007.
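
An inline expression can contain any R code that returns a single value, which helps to control how a number is printed. A small sketch with a hypothetical value (not computed from the gapminder data):

```r
# Hypothetical result; in the report this would come from the analysis above
mean_life_exp <- 54.8

# format() with nsmall keeps trailing zeros; round() alone would drop them
formatted <- format(round(mean_life_exp, 2), nsmall = 2)

sentence <- paste("The average life expectancy equals", formatted, "years.")
print(sentence)
```

Inside the document, the same call would simply be placed in the inline code, for example `r format(round(africa_07, 2), nsmall = 2)`.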

1.2 Integrate tabs

You can organize content in tabs by adding the {.tabset} class attribute to a header. All sub-headers directly below that header (for example, the level-3 headers under a level-2 header marked with {.tabset}) then appear as tabs rather than as standalone sections (learn more about the usage of tabs):

1.2.1 Tab 1

1.2.2 Tab 2

1.2.3 Tab 3

1.3 YAML metadata

To create HTML documents from R Markdown, you first need to specify the html_document output format in the YAML metadata at the top of your document.

YAML (a recursive acronym for “YAML Ain’t Markup Language”) is a human-readable data-serialization language which is commonly used for configuration files and in applications where data is being stored or transmitted.

You can find an overview of all the YAML options for R Markdown in the book “R Markdown: The Definitive Guide” (2019) by Yihui Xie, J. J. Allaire and Garrett Grolemund.

YAML metadata for this document:

---
title: "Write Reports in R Markdown"
author: "Prof. Dr. Jan Kirenz, HdM Stuttgart"
output:
  html_document:
    css: style.css # define your own CSS
    df_print: paged # data frames are printed as paged HTML tables
    highlight: default # syntax highlighting style
    number_sections: yes # numbering of sections
    theme: paper # style option
    fig_height: 4 # figure height
    fig_width: 10 # figure width
    toc: yes # table of contents
    toc_float:
      collapsed: false # show full toc
      smooth_scroll: true # toc scrolling behavior
    includes:
      after_body: footer.html # include footer
---
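
Because YAML is a data-serialization format, a header like the one above is parsed into a nested list before it is used. The sketch below parses a shortened copy of the header with the yaml package (using yaml::yaml.load directly is an illustration and only assumed to mirror what happens during rendering) to show how indentation maps to nesting and how yes and false become logical values:

```r
library(yaml)

# Shortened copy of the header above, without the --- delimiters
header <- "
title: Write Reports in R Markdown
output:
  html_document:
    number_sections: yes
    fig_width: 10
    toc_float:
      collapsed: false
"

meta <- yaml.load(header)

# Indentation becomes list nesting
html_opts <- meta$output$html_document
str(html_opts)
```

Since nesting depends only on indentation, a mis-indented option silently ends up at the wrong level, so consistent indentation in the header is worth the effort.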

2 Code Chunks

2.1 Global options

If you want to use data or packages in multiple code chunks, it is good practice to load them once in a code chunk called setup right below the YAML metadata. Likewise, if an option has to be set to the same value in many code chunks, set it globally in the setup chunk by calling knitr::opts_chunk$set. knitr treats each option passed to knitr::opts_chunk$set as a global default that can still be overridden in individual chunk headers.

R setup code chunk of this document:

{r setup, include=FALSE}

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
library(tidyverse)
library(gapminder)
library(plotly)
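
To see how a global default behaves, the following sketch sets options with knitr::opts_chunk$set and reads them back with knitr::opts_chunk$get. The fig.align option is an arbitrary example and is not part of this document's setup chunk:

```r
library(knitr)

# Remember the current defaults so they can be restored afterwards
old <- opts_chunk$get()

# Set global defaults, as done in a setup chunk
opts_chunk$set(message = FALSE, warning = FALSE, fig.align = "center")

# Read individual defaults back
msg <- opts_chunk$get("message")
align <- opts_chunk$get("fig.align")
print(msg)
print(align)

# Restore the previous defaults
opts_chunk$restore(old)
```

A single chunk can still deviate from the global default by setting the option in its own header, for example {r, message=TRUE}.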

2.2 Chunk Options

Chunk output can be customized with knitr options, arguments set in the {} of a chunk header:

  • include = FALSE prevents code and results from appearing in the finished file. However, R Markdown still runs the code in the chunk, and the results can be used by other chunks.

  • echo = FALSE prevents the code, but not the results, from appearing in the finished file.

  • eval = FALSE prevents code from running and only displays the code in a knitted document.

  • message = FALSE prevents messages that are generated by code from appearing in the finished file.

  • warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
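
The difference between echo and eval can be checked directly by knitting a one-chunk document from a string. This is a minimal sketch; the helper knit_chunk is a hypothetical name introduced only for this illustration:

```r
library(knitr)

# Three backticks, built programmatically to keep this example readable
fence <- strrep("`", 3)

# Knit a one-chunk document with the given chunk options and return the Markdown
knit_chunk <- function(chunk_options) {
  doc <- c(
    paste0(fence, "{r, ", chunk_options, "}"),
    "1 + 1",
    fence
  )
  knit(text = doc, quiet = TRUE)
}

echo_off <- knit_chunk("echo=FALSE")  # result only, code hidden
eval_off <- knit_chunk("eval=FALSE")  # code only, not executed

cat(echo_off, "\n\n", eval_off, "\n")
```

The first output contains the result of 1 + 1 without the code, the second contains the code without a result.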

3 Tables

3.1 df_print

Because df_print: paged is set in the YAML metadata, data frames are already printed as paged HTML tables:

(life_exp_07 <- gapminder %>% 
  filter(year==2007) %>% 
  arrange(desc(lifeExp)))

3.2 KableExtra

You can also use the package kableExtra to build HTML-optimized tables and manipulate table styles. It imports the pipe operator %>% and names its functions as verbs, so you can add “layers” to a kable output in a way that is similar to ggplot2 and plotly:

library(kableExtra)

kable(head(life_exp_07, 6)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) 

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---:|---:|---:|---:|
| Japan | Asia | 2007 | 82.603 | 127467972 | 31656.07 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.98 |
| Iceland | Europe | 2007 | 81.757 | 301931 | 36180.79 |
| Switzerland | Europe | 2007 | 81.701 | 7554661 | 37506.42 |
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.37 |
| Spain | Europe | 2007 | 80.941 | 40448191 | 28821.06 |

3.3 DT DataTables

The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages and DataTables provides filtering, pagination, sorting, and many other features in the tables. See this DT-Table documentation for an overview of the different options.

library(DT)

datatable(head(life_exp_07, 6),
          rownames = FALSE,
          filter = "top",
          colnames = c('Country', 
                       'Continent', 
                       'Year', 
                       'Life Expectancy', 
                       'Population',
                       'GDP per Capita'),
           caption = 'Table 1: Gapminder data overview')

3.4 gt

The gt package is all about making it simple to produce nice-looking display tables:

library(gt)

set.seed(123)

life_exp_07 %>% 
  slice_sample(n=10) %>% 
  gt()

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---:|---:|---:|---:|
| Singapore | Asia | 2007 | 79.972 | 4553009 | 47143.1796 |
| Malaysia | Asia | 2007 | 74.241 | 24821286 | 12451.6558 |
| Burkina Faso | Africa | 2007 | 52.295 | 14326203 | 1217.0330 |
| Panama | Americas | 2007 | 75.537 | 3242173 | 9809.1856 |
| Swaziland | Africa | 2007 | 39.613 | 1133066 | 4513.4806 |
| Zambia | Africa | 2007 | 42.384 | 11746035 | 1271.2116 |
| Comoros | Africa | 2007 | 65.152 | 710960 | 986.1479 |
| India | Asia | 2007 | 64.698 | 1110396331 | 2452.2104 |
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.5803 |
| Mauritania | Africa | 2007 | 64.164 | 3270065 | 1803.1515 |

Same table with some adjustments:

set.seed(123)

life_exp_07 %>% 
  slice_sample(n = 10) %>% 
  group_by(continent) %>% # one row group per continent
  gt() %>%
  tab_header(
    title = "Gapminder data overview",
    subtitle = "Data overview with the gt package"
  ) %>% 
  tab_source_note(
    source_note = "Source: Gapminder"
  ) %>%
  # bare column names; vars() is deprecated in recent gt versions
  fmt_currency(
    columns = gdpPercap,
    currency = "USD",
    decimals = 0
  ) %>% 
  fmt_number(
    columns = c(lifeExp, pop),
    decimals = 2
  )

Gapminder data overview
Data overview with the gt package

| country | year | lifeExp | pop | gdpPercap |
|---|---:|---:|---:|---:|
| Asia | | | | |
| Singapore | 2007 | 79.97 | 4,553,009.00 | $47,143 |
| Malaysia | 2007 | 74.24 | 24,821,286.00 | $12,452 |
| India | 2007 | 64.70 | 1,110,396,331.00 | $2,452 |
| Afghanistan | 2007 | 43.83 | 31,889,923.00 | $975 |
| Africa | | | | |
| Burkina Faso | 2007 | 52.30 | 14,326,203.00 | $1,217 |
| Swaziland | 2007 | 39.61 | 1,133,066.00 | $4,513 |
| Zambia | 2007 | 42.38 | 11,746,035.00 | $1,271 |
| Comoros | 2007 | 65.15 | 710,960.00 | $986 |
| Mauritania | 2007 | 64.16 | 3,270,065.00 | $1,803 |
| Americas | | | | |
| Panama | 2007 | 75.54 | 3,242,173.00 | $9,809 |

Source: Gapminder

4 Plots

4.1 ggplot2

ggplot2 is a system for declaratively creating graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details (ggplot2 documentation).

## data preparation
gap_continent <- gapminder %>% 
  group_by(continent, year) %>% 
  summarize(mean = round(mean(lifeExp),2)) 

## create plot
p <- ggplot(gap_continent, aes(year, mean, color = continent)) +
  geom_line() +
  theme_classic() +
  ggtitle("Average Life Expectancy") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.title = element_blank())

## display plot
p

4.2 Plotly

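Plotly’s R graphing library makes interactive, publication-quality graphs, including line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, and 3D (WebGL based) charts. The ggplotly() function converts an existing ggplot2 object, such as the plot p from the previous section, into an interactive plotly graph: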

library(plotly)

ggplotly(p)

5 Leaflet Maps

Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB. For detailed information, visit the package documentation.

Once the leaflet R package is installed, you can use it at the R console, within R Markdown documents, and within Shiny applications. If you would like to use leaflet in Shiny, review this tutorial.

5.1 Basic map

library(leaflet)

leaflet() %>% 
  setView(lng = 9.102360, lat = 48.740760, zoom = 17) %>% 
  addTiles() 

5.2 Map with pop up

content <- paste(
  sep = "<br/>",
  "<b><a href='https://www.hdm-stuttgart.de'>HdM Stuttgart</a></b>",
  "Nobelstraße 8",
  "70569 Stuttgart"
  )

leaflet() %>% 
  setView(lng = 9.102360, lat = 48.740760, zoom = 17) %>% 
  addTiles() %>%
  addPopups(9.101470, 48.741460, content,
    options = popupOptions(closeButton = FALSE)
  )

5.3 ggmap

ggmap is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like Google Maps and Stamen Maps and plot them using the ggplot2 framework:

library("ggmap")

# data
us <- c(left = -125, bottom = 25.75, right = -67, top = 49)

get_stamenmap(us, zoom = 5, maptype = "toner-lite") %>% 
  ggmap()

library(purrr)

# define function
`%not_in%` <- purrr::negate(`%in%`)

# prepare data
violent_crimes <- crime %>% 
  filter(
    offense %not_in% c("auto theft", "theft", "burglary"),
    -95.39681 <= lon & lon <= -95.34188,
     29.73631 <= lat & lat <=  29.78400
  ) %>% 
  mutate(
    offense = fct_drop(offense),
    offense = fct_relevel(offense, c("robbery", "aggravated assault", "rape", "murder"))
  )
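
The %not_in% helper defined above is simply the logical negation of %in%. A base R sketch of the same idea, using Negate() in place of purrr::negate() (a swapped-in equivalent, not the code used in this document) and a made-up offense vector:

```r
# Base R equivalent of purrr::negate(`%in%`)
`%not_in%` <- Negate(`%in%`)

# Hypothetical offenses to check against the excluded categories
offenses <- c("robbery", "theft", "murder", "burglary")
excluded <- c("auto theft", "theft", "burglary")

# TRUE for every offense that is not in the excluded set
keep <- offenses %not_in% excluded
print(offenses[keep])
```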

Plot data:

qmplot(lon, lat, data = violent_crimes, 
       maptype = "toner-background", 
       color = offense) + 
  facet_wrap(~ offense)

Alternative plot:

qmplot(lon, lat, data = violent_crimes, geom = "blank", 
  zoom = 14, maptype = "toner-background", darken = .7, legend = "topleft"
) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = .3, color = NA) + # after_stat() replaces the deprecated ..level.. notation
  scale_fill_gradient2("Robbery\nPropensity", low = "white", mid = "yellow", high = "red", midpoint = 650)

 

© Jan Kirenz | Made with R Markdown

HdM Stuttgart